That depends on what we're interested in showing.
Specific questions:
- Where were pizzas delivered? <- our focus
- When were pizzas delivered? Relative to poll-closing times?
Contextual question:
- What characterizes places where people wait in line?
Some hypotheses:
- Longer lines in cities, densely populated areas
- Longer lines in states with close elections
- Longer lines in states without vote-by-mail
- More deliveries in Democratic-leaning locations (Pizza to the Polls network)
- Influence of voter suppression, voter ID laws
Deep exploration of the data is helpful in refining hypotheses.
We expect these effects to be non-uniform, which calls for more complex models.
As of January 30, 2018, Colorado, Oregon, and Washington conducted all elections using a vote-by-mail system (source: Ballotpedia).
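To examine the vote-by-mail hypothesis, each delivery location could be flagged by whether its state votes entirely by mail. The following is a minimal sketch, assuming a `state` column of postal abbreviations like the one in `pizza.grouped`; the `toy_deliveries` data frame and its values are made up purely for illustration.

```r
library(dplyr)

# States that conducted all elections by mail as of January 30, 2018
vote_by_mail_states = c("CO", "OR", "WA")

# Hypothetical stand-in for pizza.grouped (values are invented)
toy_deliveries = data.frame(
  state = c("CO", "NY", "WA", "TX"),
  Pizzas_delivered = c(2, 10, 1, 5)
)

# Flag each row by whether its state is all vote-by-mail
toy_deliveries = toy_deliveries %>%
  mutate(vote_by_mail = state %in% vote_by_mail_states)

# Compare total deliveries between the two groups
vbm_summary = toy_deliveries %>%
  group_by(vote_by_mail) %>%
  summarize(total_pizzas = sum(Pizzas_delivered))
vbm_summary
```

The same `mutate()` step should carry over to the real data as long as the state column uses two-letter postal codes.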
# NOT the elections package on CRAN; this is the MEDSL package - https://github.com/MEDSL/elections/blob/master/README.md
# make sure devtools is updated
# if (!require('devtools', quietly = TRUE)) install.packages('devtools')
# devtools::install_github('MEDSL/elections')
# The package makes available the following datasets:
# presidential_precincts_2016
# senate_precincts_2016
# house_precincts_2016
# state_precincts_2016
# local_precincts_2016
library(elections)
library(dplyr)  # provides %>%, group_by(), summarize(), left_join(); harmless if already loaded
# Show percent dem by state for 2016 presidential election:
data("presidential_precincts_2016"); head(presidential_precincts_2016)
# Total votes per state and party
pres_by_state_returns_2016 = presidential_precincts_2016 %>%
  group_by(state_postal, party) %>%
  summarize(party_votes = sum(votes))
# Two-party totals per state. Each party is summed within its state directly,
# so a state with several matching party labels is not double counted.
state_2party = pres_by_state_returns_2016 %>%
  group_by(state_postal) %>%
  summarize(votes.dem = sum(party_votes[grepl("[Dd]emocrat", party)]),
            votes.rep = sum(party_votes[grepl("[Rr]epublican", party)]))
# Democratic share of the two-party vote, attached to the state geometries
us.state.dem = left_join(us.state, state_2party, by = c("state_abbr" = "state_postal")) %>%
  mutate(percent_dem = votes.dem/(votes.dem + votes.rep))
# States shaded by percent_dem (white = 0, black = 1), delivery locations as red points
plot3 = ggplot(data = pizza.grouped, aes(x = lon, y = lat, size = Pizzas_delivered)) +
  ylim(24,75) + xlim(-175, -67) +
  geom_sf(data = us.state.dem, aes(fill = percent_dem), inherit.aes = FALSE) +
  scale_fill_gradient(low = "white", high = "black", limits = c(0,1)) +
  geom_point(color = "red", alpha = .5) +
  ggtitle("Delivery locations (By unique polling places -- some overlap)")
plot3
# Show percent dem by congressional district for 2016 presidential election:
data("house_precincts_2016"); head(house_precincts_2016)
head(us_congressional())
# lm ----
# Polling places listed by state: all locations, delivered only, and total pizzas delivered
View(pizza.grouped %>% group_by(state) %>% summarize(n()))
View(filter(pizza.grouped, Status == "Delivered") %>% group_by(state) %>% summarize(n()))
View(filter(pizza.grouped, Status == "Delivered") %>% group_by(state) %>% summarize(sum(Pizzas_delivered)))
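# Attach each state's Democratic two-party share and keep delivered orders only
pizza.grouped.lm = left_join(pizza.grouped, us.state.dem[,c("state_abbr", "percent_dem")], by = c("state"= "state_abbr")) %>%
  ungroup() %>%
  filter(!is.na(percent_dem), Status == "Delivered")
# Caution: percent_dem takes one value per state, so it is collinear with the state dummies
lm1 = lm(Pizzas_delivered ~ percent_dem + state, data = pizza.grouped.lm)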
summary(lm1)
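A model that combines a state-level covariate with state dummies is rank-deficient, because the covariate is a linear combination of the dummies. The following is a minimal sketch on made-up data (the `toy` data frame, its state labels, and its values are hypothetical, not the pizza data). It shows the resulting NA coefficient, then one possible alternative: aggregating to one row per state before regressing. This is only an illustration of the issue, not necessarily the model the analysis should end with.

```r
# Toy data: three hypothetical states, each with a single percent_dem value
toy = data.frame(
  state = rep(c("A", "B", "C"), each = 3),
  percent_dem = rep(c(0.40, 0.55, 0.70), each = 3),
  Pizzas_delivered = c(1, 2, 3, 4, 5, 6, 7, 8, 9)
)

# Same form as lm1: a state-level covariate plus state dummies
fit_fe = lm(Pizzas_delivered ~ percent_dem + state, data = toy)
coef(fit_fe)  # one coefficient is NA because the design matrix is rank-deficient

# Alternative sketch: total pizzas per state, then a state-level regression
toy_state = aggregate(Pizzas_delivered ~ state + percent_dem, data = toy, FUN = sum)
fit_state = lm(Pizzas_delivered ~ percent_dem, data = toy_state)
coef(fit_state)
```

With only a few dozen states the state-level regression has little power, so a mixed model with a random state effect may be worth considering as well.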